Meta Llama 3 8B Instruct FP8 KV
The Meta-Llama-3-8B-Instruct model has its weights and activations quantized to FP8 with per-tensor scales, making it suitable for inference with vLLM >= 0.5.0. The checkpoint also includes per-tensor scaling factors for the FP8-quantized KV cache.
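A minimal sketch of serving this checkpoint with vLLM, assuming vLLM >= 0.5.0 and a CUDA GPU with FP8 support; the model path below is a placeholder for wherever this checkpoint is stored, and the prompt and sampling settings are illustrative only:

```python
from vllm import LLM, SamplingParams

# Placeholder path -- point this at the actual FP8 checkpoint directory or Hub ID.
llm = LLM(
    model="Meta-Llama-3-8B-Instruct-FP8-KV",
    kv_cache_dtype="fp8",  # store the KV cache in FP8 using the checkpoint's per-tensor scales
)

params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["Explain FP8 quantization in one sentence."], params)
print(outputs[0].outputs[0].text)
```

Because the FP8 weight and activation scales are baked into the checkpoint, vLLM picks up the quantization scheme automatically; `kv_cache_dtype="fp8"` additionally enables the FP8 KV cache, using the per-tensor KV scaling parameters shipped with the model.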
Large Language Model
Transformers